DOCUMENT RESUME

ED 110 [number illegible]                              SP 009 133

AUTHOR          Holland, James G.
TITLE           Variables in Adaptive Decisions in Individualized
                Instruction.
PUB DATE        Mar 75
NOTE            6[digit illegible]p.
EDRS PRICE      MF-$0.76 HC-$3.32 PLUS POSTAGE
DESCRIPTORS     *Criterion Referenced Tests; Curriculum Development;
                Diagnostic Tests; *Individualized Instruction;
                Individualized Programs; *Instructional Materials

ABSTRACT



This study attempts to elucidate some quantitative measures to assess
the adequacy of adaptive decisions in individualized materials. The
primary purpose of the study is to improve the curriculum developer's
ability to generate better adaptive materials by improving his judgment
of the quality of the diagnostic portions of his material in meeting
the objectives of adaptive instruction. Three measures of variables
reflecting the rationale of adapting to individual differences are
presented. These measures are: (a) ratio of teaching time to total
time; (b) predictive validity ratio; and (c) discriminability ratio.
The use of these measures is demonstrated with seven widely diverse
examples of adaptive programs. Each of the three measures yielded a
considerable range of values over the seven programs, but none of the
programs proved adequate on all three measures of the necessary
conditions for adaptive decisions. Although adapting instruction with
prescriptive tests may continue to be widely used, there is not yet an
empirical basis for that use. (Author/RC)

The present study attempts to elucidate some quantitative measures to
assess the adequacy of adaptive decisions in individualized materials.
The primary purpose of this effort is to sharpen the curriculum
developer's ability to generate better adaptive material by sharpening
his judgment of the quality of the diagnostic portions of his material
in meeting the objectives of adaptive instruction. Despite the heavy
emphasis over the past decade on prescribing materials adaptive to
individual needs by diagnosing these needs through criterion-referenced
tests (Glaser, [date illegible]), the principles involved in preparing
good adaptive materials have been left implicit.

Moreover, those who decide on the use of new materials need descriptive
tools for determining whether the materials reflect the rationale in
adapting. In the large, carefully controlled field evaluation (Office
of Economic Opportunity, 1972), products of the emerging theory of
instruction have failed to prove the worth claimed in theory for them.
Such embarrassing failures are jeopardizing the further development of
the theory of instruction. There is, however, the strong possibility
that many, or even most, of the materials and procedures used failed to
reflect the theory. Objective and quantitative measures of the key
variables involved in preparation of educational materials are
important to evaluator and developer alike if curriculum materials are
to really reflect the scientific base usually claimed for contemporary
instructional procedures.

The measures to be described here are derived from the rationale for
adapting and an attempt to formulate rather direct, simple indices of
the important variables. The general strategy adopted is similar to
that followed in developing the black-out ratio (Holland, 1967) as a
measure for programmed teaching items. The resulting measures should do
for diagnostic items what the black-out ratio does for teaching items.
They should provide a basis for research in adaptive decisions while
giving clear guidelines to the developer of adaptive materials. Effort
was made to develop measures which (1) assess the adequacy of
diagnostic items in meeting the aims of adaptation, (2) are simple and
easy to apply, (3) discriminate among programs which differ in the
adequacy of adaptive decisions, and (4) are objective in that different
persons using the measures will obtain the same results.

"Individualization" or "adaptive education" have become vogue terms
which have occasionally been used to describe quite different things
(cf. Cronbach, 1971). To some the terms connote the unstructured
curriculum of the open classroom; and for others, they can mean
individual choice of objectives. In this paper, the terms are taken to
mean that individual differences in needs are diagnosed in an attempt
to present each student with only those teaching materials he needs to
reach proficiency in the terminal objectives of the course. Hence, the
course objectives are the same for all students, but the student who is
able to pass many diagnostic items skips much unnecessary teaching
material, while the student who misses many diagnostic items must, as a
consequence, get additional material sometimes identified as remedial.

Thus, adaptive materials have two separate components: test items which
diagnose the student's need and teaching materials which fill that
need. In Individually Prescribed Instruction (IPI) and most other
adaptive materials, the two different types of items are clearly
designated by the developer.

The present paper is addressed to criteria for diagnostic items used in
adapting, not criteria used for developing the teaching items. There is
already a set of well established principles as to how teaching
materials should be designed and by what criteria they might be judged.
Teaching items have a quite different function, and a different, even
incompatible, basis for evaluating their worth than diagnostic items.
Generally, whether or not the teaching material is in the familiar
formats of early programmed instruction, the principles embodied in its
design are programming principles. Usually, the student follows some
form of gradual progression. (Although in individualized materials,
some effort is made to tailor this progression to the individual.)
Individual teaching items are expected to evoke the desired,
to-be-learned behavior before the student reaches a correct answer.
Thus, the items should provide a low black-out ratio (Holland, 1967).
It is anticipated that when the student reaches a particular level, he
will be able to give the required performance since his answer insures
he has performed adequately. Hence, good teaching material generally
has a low error rate. The teaching item does not trap the student into
errors or attempt to diagnose his deficiencies. Instead, its purpose is
to evoke the new behavior so that it may be reinforced and established.

By way of contrast, individualization requires a quite different type
of item. Test items serve a diagnostic function. They serve to
differentially predict; different performance on a diagnostic item is
used to recommend different learning materials. Therefore,
considerations in test design are appropriate for these diagnostic
items. First, to be useful, a diagnostic item must discriminate among
individuals. A zero error-rate item would be worthless. A good
diagnostic item reveals differences in performance with some students
answering correctly and others making one or more types of errors.
Thus, a good diagnostic item meets criteria incompatible with those met
by good teaching items. It is for the special properties of the
diagnostic process that new measures are here proposed and
demonstrated.



The Measures



Adaptive materials characteristically (1) save the student's time and
effort by letting him skip unneeded teaching material, (2) test each
student to determine his needs, and (3) reflect individual differences
among the students. These considerations suggest three important
measures for the merit of adaptive tests. One reflects the potential
savings in the student's time compared with the cost to him in time for
the diagnosis. Another reflects the validity of prediction of the need
for the learning material, and a third reflects the discriminability of
the test. Simple indices of these three factors are proposed in the
form of three ratios. The three indices will be called consequence
ratio, predictive validity ratio, and discriminability ratio.

Consequence Ratio

Adaptive tests are designed to give the student the teaching material
he needs without wasting his time (and patience) with material he does
not need. But testing itself comes at a cost in student time. It would
be inefficient to spend a lot of time testing to enable a student to
skip only a short teaching sequence. The appropriate index is a ratio
of teaching time to total time. The total time is the combination of
teaching time and testing time. If a unit of teaching material requires
30 minutes to complete but is preceded by a 30-minute pretest that
would enable a passing student to skip the material, then the cost to
the student of being tested is as great as the savings he stands to
gain by passing the pretest. The consequence ratio for this test would
be the 30-minute teaching period divided by the one-hour total (30
minutes teaching plus 30 minutes testing) or 0.50. Clearly, no matter
what other merits this test may have, it would be unacceptable since
the passing student can only break even. If, on the other hand, a
1-minute test could be used to prescribe this same 30-minute teaching
unit, the cost is small compared to the potential gain in passing and
the consequence ratio is 0.97. If other necessary conditions are met,
this would be a very worthy instance of adapting.
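
The two worked cases above can be reproduced with a minimal sketch.
The function name and argument names are illustrative assumptions, not
part of the original paper; only the definition (teaching time divided
by teaching time plus testing time) comes from the text.

```python
def consequence_ratio(teaching_minutes: float, testing_minutes: float) -> float:
    """Teaching time divided by total time (teaching plus testing)."""
    total = teaching_minutes + testing_minutes
    if total <= 0:
        raise ValueError("total time must be positive")
    return teaching_minutes / total


# 30-minute unit preceded by a 30-minute pretest: the student only breaks even.
print(round(consequence_ratio(30, 30), 2))  # 0.5
# Same unit prescribed by a 1-minute test.
print(round(consequence_ratio(30, 1), 2))   # 0.97
```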

The phrase "consequence ratio" is used to avoid implying anything about
the merit of the teaching material that follows. The consequence of a
test being evaluated might even include further testing; for example,
placement tests place the student in units which may include test items
that permit "looping" past subsections of the unit. The size of the
consequence is everything in the catchment area under the test in
question. It should be apparent that it is the potential consequence
which is of concern here. That some students skip the material does not
change the potential consequence of failing the test; it is, rather,
the point. The consequence ratio addresses itself to the size of the
cost in time (or amount of material) saved compared with the total
amount of time (the test plus the consequence).

Predictive Validity Ratio

The validity of a diagnostic item is the extent to which the item
correctly predicts the need or lack of need for some teaching material
before a posttest is taken to measure the same competence. The adequacy
of such a prediction can be measured simply by giving first the
diagnostic test and then the criterion (or posttest) without giving any
instruction between the two tests. Poor performance on the diagnostic
test predicts poor performance on the criterion test unless instruction
is received. Likewise, good performance on the diagnostic test predicts
good performance on the criterion test even when instruction is
omitted.

In form this procedure may sound to the reader like a reliability
measure because it involves prediction by one test of performance on
another parallel test. However, because performance on the criterion
test is the targeted performance, validity seems conceptually correct.

When no instruction is provided between tests, the diagnostic test
predicts correctly or "hits" when comparable material is answered
correctly both on the initial and subsequent tests or when comparable
material is answered incorrectly on both (see Table 1). A student
passing a diagnostic test is expected to be able to skip the teaching
material and pass the criterion test while one who fails the diagnostic
test needs the teaching material and without it should fail the
criterion test. Failures of prediction, or "misses," occur when the
student is correct on the diagnostic test but incorrect on the
criterion test or incorrect on the diagnostic test but correct on the
criterion test.

Insert Table 1 about here



With the test and retest procedure, with no intervening instruction,
the predictive validity measure is based on the ratio of hits to total
number of decisions. If everyone who passed a diagnostic test also
passed the criterion test and all who failed the diagnostic test also
failed the criterion test, then the ratio of number of hits to total
number of decisions would be 1.0. If, on the other hand, tosses of a
coin were used as the diagnostic test, these chance decisions would
lead to half or a 0.5 ratio of hits to total number of decisions. Most
tests will fall between these two extremes; for example, with a ratio
of 0.75, one quarter of the students either were unnecessarily assigned
teaching material or directed to skip material they in fact need. Such
a low value for the validity ratio would presumably be acceptable to
the developer or the user only if the consequence were very large
compared with the time needed to complete the test. Ordinarily, one
should expect validity ratios close to 0.90 or better.
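
The hit-and-miss count described above can be sketched as follows. The
function name, the pair representation, and the sample data are
hypothetical and chosen only for illustration; the definition (hits
divided by total decisions, with no instruction between the two tests)
is the one given in the text.

```python
from typing import Iterable, Tuple


def predictive_validity_ratio(outcomes: Iterable[Tuple[bool, bool]]) -> float:
    """Hits divided by total decisions.

    Each outcome is (passed_diagnostic, passed_criterion). A hit is a
    pass on both tests or a failure on both; anything else is a miss.
    """
    pairs = list(outcomes)
    if not pairs:
        raise ValueError("at least one decision is required")
    hits = sum(1 for diagnostic, criterion in pairs if diagnostic == criterion)
    return hits / len(pairs)


# Hypothetical data: 3 pass-pass, 3 fail-fail, 2 misses.
sample = [(True, True)] * 3 + [(False, False)] * 3 + [(True, False), (False, True)]
print(predictive_validity_ratio(sample))  # 0.75
```

With this made-up sample, one quarter of the decisions are misses,
which corresponds to the 0.75 case discussed above.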

Discriminability Ratio

In an exploratory effort applying the consequence ratio and the
predictive validity ratio to several sets of curriculum material, the
need for this third measure became apparent. There are instances in
which virtually all students answer diagnostic items correctly and
others in which all answer incorrectly. In either case, all students
receive the same prescriptions; therefore, the programs are not
adaptive because the tests detect no individual differences to
accommodate. A simple ratio of those passing the diagnostic items could
be used; both 0.0 and 1.00 (no one passing and no one failing,
respectively) would represent the extremes in lack of discriminability
(0.50 would be the optimum). But since the three proposed measures
would usually be discussed together, this was rejected to avoid having
a 1.00 as the ideal for both consequence ratio and the validity ratio
but as the poorest possible value for the discrimination ratio.
Therefore, it was decided to express the discriminability measure as
the ratio of the number who either passed or failed, whichever is
smaller, to the total number taking the test. The discriminability
ratio, then, varies from 0.0 to 0.50. It is zero if either all students
pass or all fail; half passing would give a ratio of 0.50; one quarter
passing (or one quarter failing) would give a ratio of 0.25.
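
A minimal sketch of this definition follows. The function name and the
class sizes are illustrative assumptions; the rule itself (the smaller
of the pass and fail counts over the total number tested) is taken from
the text.

```python
def discriminability_ratio(n_passed: int, n_failed: int) -> float:
    """Smaller of the pass and fail counts, divided by the number tested."""
    if n_passed < 0 or n_failed < 0:
        raise ValueError("counts must be non-negative")
    total = n_passed + n_failed
    if total == 0:
        raise ValueError("at least one student must take the test")
    return min(n_passed, n_failed) / total


# Hypothetical class of 20 students.
print(discriminability_ratio(10, 10))  # 0.5  (half pass)
print(discriminability_ratio(5, 15))   # 0.25 (one quarter pass)
print(discriminability_ratio(0, 20))   # 0.0  (all fail)
```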



It is clear that when there is no discriminability, that is, when the
ratio is 0.0, the materials are not adaptive to individual differences
because the test reveals no differences. Beyond this, there is no
absolute minimum acceptable value; but a ratio as low as 0.10 would
presumably be useful only if the test is highly valid and the
consequence very large. Ordinarily, a ratio of approximately
0.[illegible] would be adequate if both validity and consequence are at
least fairly large.

Use of the Measures

These three indices are quantitative measures of three variables
involved in goodness of adaptive decisions. Adequacy on each of the
variables is a necessary condition to meet the rationale of adaptive
instruction. Excellence in any one is not a sufficient condition.
Complete inadequacy for validity, consequence, or discriminability
renders the diagnostic procedure worthless for individualization
regardless of the value of the other two. On the other hand, there are
no fixed, all-purpose values that can be regarded as acceptable. The
ideals are clear, as are the values indicating extreme failure. Between
these extremes experience with the measures will be required.
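
The "necessary condition" logic can be sketched as a simple check for
complete inadequacy on any one index. This is only an illustration: the
paper gives no fixed acceptance values, so the cutoffs below are
assumptions taken from its worked extremes (a break-even consequence
ratio of 0.50, coin-toss validity of 0.5, and zero discriminability),
and the function name and sample values are hypothetical.

```python
from typing import List


def inadequate_indices(
    consequence: float,
    validity: float,
    discriminability: float,
    break_even_consequence: float = 0.5,  # assumption: student only breaks even
    chance_validity: float = 0.5,         # assumption: no better than a coin toss
) -> List[str]:
    """Names of the indices that show complete inadequacy."""
    failures = []
    if consequence <= break_even_consequence:
        failures.append("consequence")
    if validity <= chance_validity:
        failures.append("validity")
    if discriminability <= 0.0:
        failures.append("discriminability")
    return failures


# Hypothetical values: excellent consequence and validity cannot make up
# for a test that everyone passes or everyone fails.
print(inadequate_indices(0.97, 0.90, 0.0))   # ['discriminability']
print(inadequate_indices(0.50, 0.50, 0.25))  # ['consequence', 'validity']
```

An empty list here means only that no index is at an extreme of
failure; it does not mean the values are acceptable, since judging the
range between the extremes is left to experience with the measures.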

It must be clearly understood that these measures do not evaluate the
overall usefulness of any set of curriculum material. The technique for
such evaluation is well known and involves measurement on criterion
measures after the students have used prescribed teaching material. The
present three measures in no way evaluate the teaching material. They
are, rather, measures of the adaptive characteristics of the program
and not measures of characteristics of the teaching material. Neither
are they measures of the achievement to be expected from using the
materials. It is possible for the adaptive testing to be excellent and
for the curriculum to fail to teach. It is even possible for the
adapting characteristics to be poor and the overall usefulness of the
material to be very good; although, in this case, the overall worth of
the curriculum material would probably be improved by correcting the
deficiencies in the diagnostic testing and adapting procedures.

Experimental Demonstration of the Measures



The actual use of these measures will demonstrate their utility in
revealing to the developer and consumer strengths and weaknesses in the
diagnostic and adapting procedures. Segments of seven different sets of
adapting materials were evaluated using these measures in order to
determine the practicality and usefulness of the measures.
Instructional programs were chosen to cover a variety of adapting
styles. Chosen materials included two LRDC courses (Science and Math),
a program with large loops, one computer-assisted instruction program
involving very fine adjustments, a remedial math program with only an
overall placement test, one example of a Crowder-type "intrinsic"
program, and a linear program with a binary decision tree enabling
initial placement. Since only small segments were tested, the results
should not be taken as necessarily indicating the quality of the
adaptive test material through the entire program even though an effort
was made to choose representative units.

Ideally, published programs would include sufficient data to estimate
these three indices. Unfortunately, of the six programs receiving
classroom use, none gave information of any kind on the validity of the
test items; and only two gave data sufficient to estimate the
discriminability of the items. These same two gave information
necessary to make an informed estimate of the relative amount of time
necessary for teaching components and for testing components (although
all but one did give teaching time information). Therefore, it was
necessary to gather data sufficient to calculate the three ratios using
students from the populations appropriate to each program so far as was
practical. The techniques of measurement are described for each of the
seven course segments. Among the seven a variety of problems are
revealed and recommendations emerge illustrating the value of the
measures.

Job Corps Advanced General Education Program

The Job Corps Advanced General Education Program (Office of Economic
Opportunity, 1968) is a self-instructional program designed for Job
Corpsmen who have high school reading skills but who have not finished
high school. The course covers everything necessary to take and pass
the GED test resulting in a high school diploma. The course as a whole
consists of 124 lessons divided into three levels. The lessons are
grouped into units consisting of 2 to 1[digit illegible] lessons per
unit (see Figure 1). Each of the 24 units is preceded by a screening
test. A score of 85 percent or better on the screening test enables the
student to skip all of the lessons in that unit. Unit II-2, a unit with
the average number of lessons, was chosen as the portion of the program
to demonstrate the three indices of adapting quality.

Insert Figure 1 about here

This particular program was of interest for several reasons. First, as
shown in Figure 1, it has a classic adapting structure with unit tests
that enable the student to "loop" over entire portions of the teaching
material when test performance indicates that he does not need the
material. Moreover, this program was known to be of high overall
quality. While this does not imply that the adaptive features are good,
the tests seemed of adequate length to be valid and yet short enough to
be efficient.

The six lessons required, according to the Teacher's Manual, 510
minutes. Thus, passing this unit's screening test saves a student 510
minutes of work. It required only 14 minutes on the average for the 28
college sophomores serving as subjects to complete the screening test.
Therefore, the consequence ratio was an excellent 0.97. The lower
educational level of the targeted high school dropout could make a
difference, but even with the testing time doubled, the ratio would be
0.95.
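
The two figures quoted for Unit II-2 can be checked with a short
sketch. The helper name is an illustrative assumption; the 510-minute
teaching time and the 14-minute average testing time come from the
text.

```python
def consequence_ratio(teaching_minutes: float, testing_minutes: float) -> float:
    """Teaching time divided by total time (teaching plus testing)."""
    return teaching_minutes / (teaching_minutes + testing_minutes)


TEACHING_MINUTES = 510  # six lessons, per the Teacher's Manual
OBSERVED_TEST_MINUTES = 14  # average for the 28 college sophomores

observed = consequence_ratio(TEACHING_MINUTES, OBSERVED_TEST_MINUTES)
doubled = consequence_ratio(TEACHING_MINUTES, 2 * OBSERVED_TEST_MINUTES)

print(round(observed, 2))  # 0.97
print(round(doubled, 2))   # 0.95
```

Because the teaching time is so large relative to the test, the ratio
is insensitive to even a doubling of testing time.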

In estimating the validity, these subjects first took the screening
test and then, without intervening learning material, took the
posttest. A failure on the screening test forecasts failure on the
posttest if the student has not taken the prescribed teaching material.
Similarly, a pass on the screening test predicts a pass on the
posttest. A failure of prediction would occur if the subject passed the
pretest and failed the posttest or failed the pretest and passed the
posttest. Table 2 shows the hits and misses in prediction for these
data. The predictive validity ratio is 0.36. In other words, the
prediction is somewhat less accurate than simply flipping a coin to
decide whether or not the student should skip the unit. This program is
totally inadequate in the validity of the test. The cause of the
problem is apparent on examination of the pretests and posttests.
Throughout the 24 screening tests in the program, factual information
is tested; and, in the case of our subjects, none was able to attain a
passing score on these factual questions. However, the posttests
throughout the program (and in the GED test itself) test not factual
information, but reading comprehension in the designated subject area.

Insert Table 2 about here

All subjects performed identically on the pretest; they all failed and
presumably the high school dropout would do no better. Thus the
diagnostic test is non-discriminating; the discrimination ratio is 0.0.
Because all students would be given exactly the same prescription
(i.e., take the six lessons), this material is not, in fact,
individualized.

The present author still considers this program to be of high quality.
The teaching material does the job it was designed for and graduates of
the program are able to pass the GED test, as shown by the data of the
developers and by this author's own direct experience. But the quality
of the teaching material is not in question here. The present study
deals only with measurement of variables in adapting to individual
differences. If all of the unit screening tests perform as badly as the
one tested in this study, then it would be true that the adaptive
feature is not doing its job. Although it is excellent to have tests
that are small and manageable in comparison to the size of the
consequence, the pretests seem to test for something different than the
terminal behavior reflected in the posttest. There is then no basis to
continue to use the screening tests because of the low validity and low
discrimination. If the developers had had at their disposal the indices
recommended here, corrective measures could have been taken. It would
certainly seem ill advised not to adapt a curriculum covering the
totality of high school. Nevertheless, testing in the present form is a
complete waste of time since the validity index shows that, at least
for unit II-2, prediction is below chance in accuracy and the
discrimination index shows that the pretest fails completely to
discriminate.

These outcomes, if general over the whole program, suggest two
recommendations. First, a revision of this program should include
redevelopment of the screening tests to better predict the terminal
behaviors shown in the posttest. Second, anyone now using the program
should stop using the screening tests and either give all students the
whole course or administer the present "posttests" before the unit as
screening tests since the posttests and the GED both measure reading
comprehension.

Programmed Reviews of Mathematics (Flexer & Flexer, 1967)

This is a program in remedial mathematics for college students who have
had the typical mathematics background required of entering college
students but who are now beginning a science course requiring use of
math. Flexer and Flexer indicate that many students are unprepared to
handle the mathematics in a typical basic science course. They prepared
six short, remedial books each of which can usually be completed in one
to three hours. Each book has a placement test to diagnose the
student's need for remedial work in the area of mathematics covered by
the book.

The Flexer and Flexer program is useful for the present study for
several reasons. First, the problem they address seems especially
likely to provide important advantages of adapting to individual
differences. All of the students supposedly have learned all of the
mathematics covered, but the experience of college science teachers is
that a sizable percentage of their students lack the basic mathematics
necessary for lab work. Abraham Flexer was motivated by the desire to
avoid spending weeks of class time in a biology course teaching math to
those students who need it and thereby depriving those who do not of
the opportunity to proceed with the intended contents of the course.

Second, the present author was well acquainted with this program
because it was developed as a project of an organization directed by
him (the Harvard Committee on Programmed Instruction). To the author's
knowledge, Flexer and Flexer were aware of the requirements of good
adaptive test materials. They knew the need for correctly assessing the
individual's need as efficiently as possible and the need to
discriminate between students who did and did not need special work in
math. In short, this program was chosen because it should be exemplary
on all three variables.

Each of the six programs is published in a separate booklet and
includes a considerable amount of data from the several test-runs of
the material at Harvard University and Emmanuel College in biology,
chemistry, psychology, and sociology courses. Much of the data is
concerned with the teaching material and the gains produced by the
course, which are, of course, the proper emphasis for program
evaluation. They also include ample data on teaching times and at least
enough data to estimate the consequence ratio for the program as they
tested it. Discrimination ratios are also reported as the percentage of
each class which passed each test item and hence was excused from using
that portion of the program. Unfortunately, they did not test for the
validity of the test items.

To obtain estimates of validity for the present study, a group of
undergraduate psychology students were administered the test materials
for one logarithm unit and for the three fraction units. The tests used
were not the single items on which the programs were originally
evaluated, but instead were the tests provided in the introduction of
the books, which contained from five to eight items per decision for
the four decisions evaluated. The criterion for a pass in each case
allowed for one incorrect answer in each set.

Surprisingly, this was not the form of test used in their original
testing of the program. They had tested the whole class at the
beginning of the term with a placement test having a single item for
each separate diagnostic decision. They gave no reason in the published
version for changing from the single-item to the several-item test.
Perhaps it was an effort to increase the validity or perhaps it was on
the advice of the publisher who may have felt that a slightly longer
test would have better face validity and thus be better for marketing.
Nevertheless, it seemed the proper course to apply the measures of the
adequacy of adapting to the final published long test form.

This decision led to a serendipitous result. The outcome for the longer
test is considerably different than for the shorter test. Changing the
test in a way that superficially would seem likely to improve it,
instead, when empirically evaluated, is shown to have seriously flawed
a previously excellent program.



In the first effort to apply the mear.ures, X3 students took the 



eight-item pretest for the logarithm unit (the first of three decisions for 
tke logarithm book) and one week later repeateid this test without, of course, 
using the program. Similarly, 10 students took the three pretests for the 

/three parts of the- fractions program (the lengths of these were four, five, 

/ * 

./ 

/ and six items) and retook the tests one week later. For all sets of tests, 

\ 

records were kept of the time required for each student to complete each 
test. In calculating the consequence ratio, 'the published times for the 
programmed materials were used, T^ble 3 and 4 indicates the results 



Ins.ert Tables 3 and 4 about here 



of these evaluations. Both show reasonably high validity ratios (0.93 for logarithms and 0.83 for fractions) and fair consequence ratios (0.38 for logarithms and 0.82 for fractions). Surprisingly, however, the tests for both programs showed poor discrimination. Of the 28 students taking the logarithm test, 26 failed and only 2 passed, for a discriminability ratio of only 0.07. Of the 30 decisions in the fractions program, only 5 were passed, for a discriminability ratio of 0.17. These discriminability ratios were far from the values indicated by Flexer and Flexer for the percentage passing the various tests with one item per decision. Unlike the present results with the longer tests, in which the bulk of the students failed, they found many single-item tests were passed (62 percent for the fractions tests, yielding a 0.38 discrimination index). The combination of the validity data measured in the present study, which was unavailable in the Flexer and Flexer data, and the consequence ratio and discriminability from Flexer and Flexer's data suggests that this program is excellent in its overall adapting characteristics. However, the very low discriminability obtained in this study indicates that the recommended long form of the test has largely ruined the adaptive feature of the program.
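
The discriminability figures above can be reproduced with a short calculation. The sketch below assumes, on the basis of the reported numbers (2 passes out of 28 giving 0.07; 62 percent passing giving 0.38), that the discriminability ratio is the share of decisions falling in the less frequent outcome, so that 0.50 is the largest possible value. The function name is illustrative and does not come from the study.

```python
def discriminability_ratio(passes: int, failures: int) -> float:
    """Share of decisions in the less frequent outcome (0.0 to 0.5).

    Assumed definition, inferred from the figures reported in the text.
    """
    total = passes + failures
    if total == 0:
        raise ValueError("at least one decision is required")
    return min(passes, failures) / total


# Logarithm pretest: 2 passes and 26 failures among 28 students.
logarithm = discriminability_ratio(passes=2, failures=26)

# Fractions pretests: 5 passes among 30 decisions.
fractions = discriminability_ratio(passes=5, failures=25)

print(f"logarithms: {logarithm:.2f}, fractions: {fractions:.2f}")
```

Under this reading a test that nearly everyone fails and a test that nearly everyone passes score equally poorly, which matches the way the text treats both cases.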

To determine whether or not the unexplained change in the tests had this effect, the fraction tests were administered to another set of ten psychology students. It was possible to identify a single test item in each of the three fractions tests which was like that used in the original single-item test. Using these items, an evaluation was made for both the single-item and the longer test form with the same subjects. It can be seen in



Insert Table 5 about here



Table 5A that the results with the long form of the tests replicate the first set of data reported in Table 3. For the long tests, a good validity ratio (0.83) and a good consequence ratio (0.83) are to little avail in view of the poor discrimination ratio (0.17) caused by the bulk of the outcomes being failures. On the other hand, as shown in Table 5B, using the single-item test, as Flexer and Flexer did originally, provided very good discriminability (0.40, quite close to their reported 0.38 ratio) with many passes. The single-item test also, of course, increased the consequence ratio to an excellent 0.96. However, as one might expect, the short test does have a lower validity ratio (0.73).

These results dramatically illustrate the merit of gathering the data needed for these three indices of the goodness of adapting. An originally tested version adapted fairly well to individual differences, with excellent discriminability and a sizable gain for passing the diagnostic test. Flexer and Flexer may have had some indication that the predictive validity was low, although in combination with the good values for the other two variables the original form was useful and acceptable. It may have seemed prudent and safe to lengthen the test somewhat to increase its validity. Surprisingly, this created a serious deficiency in the program which apparently went undetected. An adequate level is necessary on all three measures. Each is necessary; no two of them are sufficient. Moreover, a step taken to improve one could cause deterioration in another. Lengthening a test could reasonably be expected to improve validity, but it will also take more student time. And a high validity can be obtained by using pre- and posttests which almost all will fail (or everyone will pass), but such a test has low discriminability.

LRDC's Individualized Mathematics

The classic program in the field of individualized instruction is LRDC's Individually Prescribed Instruction in Mathematics, or IPI Math (Lindvall & Bolvin, 1967). This set of curriculum materials has served as a prototype for individually prescribing instructional units through diagnostic testing. The program contains 359 instructional objectives from 10 learning areas. The objectives are subdivided into 7 graduated levels of difficulty called A through G, corresponding roughly to a conventional kindergarten through sixth grade math curriculum. There is at each level a placement test which diagnoses the need for each unit appropriate to that level. The placement test indicates which units may be skipped and which units the student should be pretested on. The pretest for each unit identifies more specifically within the area which lessons or "skill booklets" the student should use. The skill booklets contain the educational material, but they also have additional testing material, the curriculum-embedded tests (CET's). These diagnose the subject's mastery of that lesson and indicate his readiness to take still another test, the unit posttest.

Besides the prototypical nature of the IPI Math program, a second important consideration suggested its use in the present study. The layering of different tests presents interesting problems in evaluating, separately and in combination, the various elements of a compound diagnostic system. For example, the consequence of passing one item on a placement test is to skip not only all teaching material in the catchment area under that item, but also the pretest materials, curriculum-embedded tests, and posttest materials. Hence, placement test items may have a large consequence, although much of the consequence is additional testing. On the other hand, to evaluate the overall testing structure, the total set of tests could be set apart from the teaching material to determine the cost effectiveness of the total testing structure for a given unit of teaching.

A unit of level B was chosen for analysis. Twenty-two children from a second grade urban classroom were routinely given the placement test at the beginning of the school year in September. In this instance, data were collected on the time required to do the test (33 minutes to 2 hours, with a median of 1 hour, 20 minutes). The questions for each of the 10 areas were separately analyzed, and the multiplication unit was chosen for further analysis based on the high degree of discriminability shown by the placement test items. (Eleven students passed and eleven failed this unit, for a 0.50 discriminability ratio.) It should be noted that, although the most discriminating portion of the placement test was chosen, the average discriminability for all units represented on the placement test was rather good, with an overall index of 0.35.

All students were then given three test packages, the pretests, the CET's, and the posttests, for all four skills in the multiplication unit. These tests were administered before any additional learning material and regardless of whether they had passed or failed the multiplication items in the placement test. A comparison among these tests provides the index of validity for the tests. Completion times were measured for each of the tests. Later, when the children came to the designated units in the curriculum, estimates were made of the teaching time. The teaching time for each child was based on the number of days spent working on these math units and the length of the scheduled daily math time. Although it might be argued that this is a realistic way to measure the time because it is the way the material is used, potential distractions in classroom use make it crude. For this reason, an additional method was used to calculate "time". Since test items and teaching items in IPI Math are similar in form, content, and length, "time" was estimated simply by counting the number of items used for teaching and the number of items in each of the various tests. The results of these two methods of estimating the consequence and costs of testing corresponded closely.

The variables in adapting decisions can be measured separately for each different test or for combinations of the tests. Four separate evaluations seem particularly of interest:

(1) evaluation of the placement test in terms of the savings in passing the test for the student who would, by passing, escape all of the work under the catchment for given items in the placement test, including the teaching material, pretest, CET's and posttest;

(2) evaluation of the placement test and the pretest together in terms of the validity of the two tests combined and of all the consequences below the combination of the two tests, namely, the teaching material, CET's and posttest;

(3) evaluation of the pretest alone;

(4) evaluation of all tests in combination.

Placement test alone. There are only five placement test items diagnosing the need for the multiplication unit. The combination of all the consequences of failing these five items, including pretest, teaching material, CET's and posttest, is 375 items, giving a consequence ratio of 0.99. Thus, the savings to the student of passing the placement test are considerable; the savings are not simply that he skips teaching material but that he skips additional test material as well.

Validity of the placement test items was evaluated by comparing how well the placement test predicted performance on the CET's for each of the four skills in the multiplication unit. The frequency of hits and misses is indicated in Table 6. Passing the multiplication items on the



Insert Table 6 about here

placement test enables the student to skip all four multiplication skills. Hence, four "decisions" are made. With CET's from the four skills for 22 students, there were 58 hits out of a total of 88 predictions, for a ratio of 0.66. As indicated earlier, about half of the placement decisions were passes and half failures, for a perfect discriminability ratio of 0.50.

Placement plus pretests. The next question to be considered is the use of the placement test and the pretest in combination to predict whether the individual lessons are required. With the combination of the two tests, discriminability remains high with a ratio of 0.40 (see Table 7), and the addition of the pretest lowers the excellent consequence ratio only slightly, to 0.93, with a consequence of teaching material, CET's and posttests at a cost of placement test and pretests.

Insert Table 7 about here

Table 7 presents the possible combinations of passes and failures on the placement test and the pretest. For each of the combinations, a prescription is derived, and this prescription is evaluated as a hit or a miss based on the possible outcomes on the criterion test. In the last column of Table 7, the data obtained from our test subjects are presented according to outcomes on the placement test, the pretest and the criterion test. The predictive validity depends on the performance of each student on the combination of the two tests. In normal use of the materials, a student who passes the placement test will not receive the pretest, since he is passed out of the multiplication unit. Therefore, in this study, in which subjects take all the tests and no teaching material, if the placement test is passed, predictive validity is calculated without regard to pretest outcome. As shown in Table 7, if the placement test is passed, there is a "hit" if the CET's are also passed, and a "miss" if the CET's are failed, regardless of whether the pretest was passed or failed.



Normally, students who fail the placement test then take the pretest, and if they also fail a given pretest, they must use that skill booklet; in other words, it is predicted that without the teaching material they would fail the CET's.

If a test subject fails both the placement test and the pretest, the resulting decision is a "hit" if he also fails the CET's and a "miss" if he passes the CET's. If, on the other hand, a subject fails the placement test but then passes the pretest, the prediction is that he should pass the CET's, since his classroom counterpart using the teaching material would not be prescribed the unit. Those test subjects taking all the tests but given no teaching materials who fail the placement test and pass the pretest yield "hits" of prediction if they pass the CET's and "misses" if they fail the CET's.
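
The hit and miss rules just described can be written out compactly. The sketch below is only an illustration of those rules as stated in the text; the function names are hypothetical, and the code is not taken from the study.

```python
def predicted_to_pass(placement_passed: bool, pretest_passed: bool) -> bool:
    """A pass on either test prescribes skipping the skill booklet,
    which amounts to predicting a pass on the CET's.

    When the placement test is passed, the pretest outcome is ignored.
    """
    return placement_passed or pretest_passed


def classify(placement_passed: bool, pretest_passed: bool, cet_passed: bool) -> str:
    """Label one decision as a 'hit' or a 'miss' against the criterion test."""
    prediction = predicted_to_pass(placement_passed, pretest_passed)
    return "hit" if prediction == cet_passed else "miss"


# Enumerate every combination of outcomes on the three tests.
for placement in (True, False):
    for pretest in (True, False):
        for cet in (True, False):
            print(placement, pretest, cet, classify(placement, pretest, cet))
```

Writing the rule as a logical "or" makes plain that the combined procedure predicts failure only when both tests are failed.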

The results of this double level of testing are surprising. Using the two tests in combination gave only 59 hits out of 88 decisions, for a ratio of 0.67. The combination of the placement and the pretest in this instance provides a negligible improvement of only 1 hit in prediction over use of the placement test alone. Neither the placement test alone nor the combination of the placement test and the pretest provides a very adequate prediction when chance assignment would give 50% hits.
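
As a worked check on these figures, the validity ratio can be taken as hits divided by total decisions, which is the definition the reported numbers imply. The names below are illustrative, and the 0.50 chance level is the one stated in the text.

```python
CHANCE_LEVEL = 0.50  # chance assignment, as stated in the text


def validity_ratio(hits: int, decisions: int) -> float:
    """Proportion of decisions whose prediction matched the criterion test."""
    if decisions <= 0:
        raise ValueError("decisions must be positive")
    return hits / decisions


placement_only = validity_ratio(hits=58, decisions=88)
placement_plus_pretest = validity_ratio(hits=59, decisions=88)

for label, ratio in [
    ("placement test alone", placement_only),
    ("placement test plus pretest", placement_plus_pretest),
]:
    margin = ratio - CHANCE_LEVEL
    print(f"{label}: {ratio:.2f} ({margin:+.2f} relative to chance)")
```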

Pretests alone. Next the pretest alone was evaluated as though there were no placement test. The validity of the pretest was evaluated against two different criterion tests, the CET's and the posttest. The results of evaluating pretest validity against the two tests are given in Table 8. Almost identical validity ratios, 0.86 and 0.85, were obtained; these provide useful levels of validity. Moreover, with [illegible] of the students passing the pretest, the discriminability of the test is good. The consequence ratio for the pretest alone is 0.94, but the consequence in this instance includes not only the teaching material but the CET's and the posttests.

Insert Table 8 about here



Overview: All tests in IPI Math. Either alone or in combination the placement test and the pretests give good consequence ratios. However, there is considerably more testing in IPI Math. The CET's and posttests are part of the consequence of failing the placement test or the pretests. But with the complete program the question becomes: What is the ratio of the consequent teaching time alone compared to the total time for teaching plus all testing? This consequence ratio, evaluating the totality of the testing, is 289 teaching items compared to a total of 379 items for teaching and all testing, for a ratio of 0.76. Since the break-even point on testing and teaching is 0.50, this ratio of 0.76 is rather disappointing. Individually the tests in IPI Math have satisfactorily large consequences but, in combination, testing is overdone.
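
The contrast between the placement test taken alone and the testing structure taken as a whole can be made concrete with the item counts given in the text. The sketch assumes the consequence ratio is the consequence divided by the sum of consequence and test cost, which is consistent with the stated break-even point of 0.50; the function name is illustrative.

```python
def consequence_ratio(consequence: float, test_cost: float) -> float:
    """Consequence of a decision relative to consequence plus testing cost.

    Both arguments must be in the same unit (items or minutes).
    A value of 0.50 means the test is as long as what it decides about.
    """
    total = consequence + test_cost
    if total <= 0:
        raise ValueError("total must be positive")
    return consequence / total


# Placement test alone: 5 test items decide about 375 items of
# teaching material, pretest, CET's and posttest.
placement_alone = consequence_ratio(consequence=375, test_cost=5)

# All tests together: 289 teaching items out of 379 items in total,
# so the remaining 90 items are testing.
all_testing = consequence_ratio(consequence=289, test_cost=379 - 289)

print(f"placement test alone: {placement_alone:.2f}")
print(f"all tests combined:   {all_testing:.2f}")
```

The drop from the first figure to the second arises because downstream tests count as consequence in the first calculation and as cost in the second.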

The CET's and the posttest are of little use in diagnosing individualized decisions. They follow the teaching material and, unless the teaching material is inadequate, these items should be very poor discriminators, since most should be answered correctly. However, some form of testing after teaching is no doubt needed to avoid misuse of the teaching material in a classroom situation. It is doubtful, however, that both the CET's and the posttests are needed.

It is especially interesting that the combination of placement and pretest shows no improvement in validity over the placement test alone and, for the multiplication unit in level B at least, the pretest alone is the more valid test. Anyone revising IPI Math, or attempting a similar curriculum, should avoid the layering of test upon test. Given two tests of known validity, the higher validity test should be used. Validities are not additive. With the combined placement and pretest procedure a pass on either test prescribes a skip for the unit; thus the lower validity test degrades the prediction. The false skips of the two tests combine to lower the validity below that of the more valid test.

Users of the present IPI Math would be well advised not to use all of the tests. It is interesting in this regard that Leinhardt (1974) found for the IPI Math course a negative correlation between the amount of testing done by various teachers and the student achievement at the end of the school year. Knowing the overall relative validity of the placement and pretest would help in choosing between pretests and placement tests if only one were to be used. If diagnosing the student's needs is the only consideration, it would seem reasonable to use pretests only, abandoning the placement test, CET's and posttests. But the most satisfactory solution would be a new effort by the developers of IPI Math to revise their tests to gain validity greater than that of the present pretest while greatly lowering the overall burden of testing.

Individualized Science

The Individualized Science Curriculum (IS) (Klopfer, Champagne, & Pittman, 1972) is another well-known LRDC individualized program considered by this author to be of high overall quality. IS is of special interest because, although it is individualized, diagnostic testing is much de-emphasized as compared with IPI Math. The only tests normally needed in Level B, for example, are the unit pretests. Passing a given section of a unit pretest permits the student to skip that lesson in the unit.

In this study one unit test (the Hooke Unit for Level B) was used with ten students from a school using IS. Since there is no posttest, predictive validity was measured by administering the pretest twice and predicting test performance of the second testing from the first testing. As always in these validity checks, the teaching materials were not used between the two administrations of the test. The subjects were first tested in the summer a few weeks before the start of school and again about two months later when they took the unit pretest as a regular part of their classroom activity. Good performance on the Hooke pretest could exempt a student from six lessons, but two of these lessons were under the catchment of the same pretest questions. Therefore, five individual decisions were evaluated for each of the ten subjects. The six lessons required an average of 116 minutes and the pretests required an average of 11 minutes, giving an excellent consequence ratio of 0.91. The validity ratio was 0.98, reflecting the 49 hits and 1 miss summarized in the hit-miss chart in Table 9. Table 9 indicates that the high predictive validity results from all fifty test decisions being failures on the first testing and all but one being failures on the second testing. Thus, the test failed completely to discriminate.

Insert Table 9 about here



Since this unit pretest lacks discriminability, the material does not adapt to diagnosed individual differences. Since all fail the pretest, all subjects would have received identical prescriptions. Despite a very high consequence ratio and a very high validity, the adapting procedure is inadequate because it fails to discriminate. Adequate values for each of the three variables are necessary for an adapting system; none is sufficient alone. If this failure of discrimination is characteristic of the whole science program, the user should abandon pretesting and either use it as a linear program or choose lessons based on student interest or teacher objectives.
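
The way a uniform pattern of failures produces high validity together with zero discriminability can be shown from the paired test-retest outcomes. The sketch below rebuilds the fifty Hooke decisions from the description in the text (all failed at the first testing, one passed at the second). The function name and data layout are assumptions made for illustration, and discriminability is again taken as the share of first-testing decisions in the less frequent outcome.

```python
from typing import List, Tuple


def ratios_from_retest(outcomes: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (validity, discriminability) from (first_pass, second_pass) pairs.

    A hit is a decision whose second outcome matches the first.
    Discriminability is computed on the first testing only.
    """
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one decision is required")
    hits = sum(1 for first, second in outcomes if first == second)
    first_passes = sum(1 for first, _ in outcomes if first)
    validity = hits / total
    discriminability = min(first_passes, total - first_passes) / total
    return validity, discriminability


# Fifty decisions: every one failed first; one was passed on the retest.
hooke_decisions = [(False, False)] * 49 + [(False, True)]
validity, discriminability = ratios_from_retest(hooke_decisions)

print(f"validity: {validity:.2f}, discriminability: {discriminability:.2f}")
```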

The developers seem to have concentrated on the teaching material. In doing so, they produced an interesting and useful science curriculum. The failure of the diagnostic procedure, even if characteristic of the whole curriculum, does not negate the value of IS as teaching material. There is no empirical, logical, or compelling intuitive reason to believe that diagnostic testing and individual prescriptions will be particularly useful in all areas of instruction. Possibly the developers of IS had doubts about the importance of adapting in this curriculum and de-emphasized it. But without discriminating tests their exciting instructional material would be improved by dropping testing altogether. Otherwise, discriminating tests are needed.

Inductive Reasoning Program

The inductive reasoning program is a 256-item linear program constructed as an experimental demonstration of the teaching of a basic aptitude, Thurstone's reasoning factor (Holland, 1962). Program items consisted of a row of "bottle-shaped" objects varying in color and direction which were arranged to provide patterns. The student picked from among five alternatives the object which would be next if the pattern were extended.

For an experiment on evaluation of branching effectiveness (Holland, Hoffman & Doran, 1972), a binary search sequence of items was added to provide a maximally efficient way to place a student at his proper beginning point in the otherwise linear sequence. The binary search procedure placed students by beginning with the middle item of the program and bisecting distances forward or backward after correct or incorrect responses. After the seventh choice, the student was considered to be at his correct beginning point (see Figure 2 for a graphic representation of the binary search procedure). All data needed to estimate consequence, validity,

Insert Figure 2 about here

and discriminability ratios are available from the study. For evaluating whether each individual decision in the branching sequence was a hit or a miss, similar program items in a pretest used in that study served as a criterion test.
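
The binary search placement can be sketched as follows. The text gives only the outline (start at the middle item, bisect forward after a correct response and backward after an incorrect one, stop after seven choices), so the exact step sizes and item numbering below are assumptions, and the function name is hypothetical.

```python
from typing import Iterable


def binary_placement(program_length: int, responses: Iterable[bool]) -> int:
    """Place a student in a linear program by bisecting after each response.

    Assumed details: the first item shown is the middle item, the first
    jump is a quarter of the program length, and each later jump is half
    the previous one (never less than one item).
    """
    position = program_length // 2
    step = program_length // 4
    for correct in responses:
        position += step if correct else -step
        step = max(step // 2, 1)
    return position


# Seven responses on the 256-item program.
all_correct = binary_placement(256, [True] * 7)
all_incorrect = binary_placement(256, [False] * 7)
mixed = binary_placement(256, [True, False, True, False, False, True, True])

print(all_correct, all_incorrect, mixed)
```

With these assumed step sizes (64, 32, 16, 8, 4, 2, 1), seven responses are enough to reach any starting point between the two ends of the program.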



Insert Table 10 about here 



The consequence ratio for the seven-item test with a 256-item consequence is 0.97. With eleven subjects and seven decisions each, the total number of decisions in the program was 77. With twenty-five failures and fifty-two passes, the overall test is discriminating. However, only thirty-six of the total number of decisions were hits, as compared to forty-one misses, for a validity ratio of 0.47 (see Table 10). Thus, while the branching tree might be an efficient use of testing time, the use of single multiple-choice items failed to be better than a flip of the coin so far as validity was concerned.

This program illustrates the problem of test size. A short test may be very efficient in terms of time, but short tests tend to be less valid. Increasing test length can increase validity, but a longer test decreases the consequence ratio. The inverse relationship between test length and adequacy of adaptive testing poses a special dilemma for the curriculum developer.

A Tutor Text Program

Especially popular a decade ago, tutor texts have been prepared in a wide variety of topics from bridge to electronics. Often they were intended for popular consumption and sold in the trade market, but some were prepared for college courses and technical training. In these programs each page has some material to be read followed by a multiple-choice item with two to four choices. Each potential choice directs the student to a particular "next" page which itself has some material to read and other multiple-choice items. A correct answer usually keeps the student on the mainline items, while incorrect answers loop him through remedial material of one or a few items and eventually back to the mainline items.

A representative of this type of program was included in the present analysis for several reasons. First, it would be unthinkable not to have an example of Crowder's "intrinsic" programming technique, which for some years was the leading example of adapting to each student's special needs. The approach is also of interest because it represents a rather fine-grain approach to teaching and testing. A single page, and often much less than a single page, includes one unit of teaching material and one test item. Hence, the student is diagnosed as to his ability to handle the next small mainline item or his need for a short remedial loop. In addition, this is an instance in which the teaching material requires no overt responding by the subject. All answers are to the diagnostic portions of the material. Since teaching and diagnostic material are so closely intertwined, it might seem to the casual observer that intrinsic programs violate the assumption that teaching and testing can be separately identified. However, this intertwining offers no problem in practice, and it would seem to offer no problem in theory in view of Crowder's description:

    Intrinsic programming assumes that the basic learning
    takes place during the student's exposure to the
    new material. The multiple-choice question is
    asked to find out whether the student has learned;
    it is not necessarily regarded as playing an active
    part in the primary learning process. (Crowder, 1962,
    p. 3)

An evaluation was made of the test elements in the seven mainline items in Chapter 5 of A tutor text on the arithmetic of computers (Crowder, 1960). Testing for validity and discriminability required use of a test-retest procedure because there was no other appropriate criterion test. The test elements are spread through the program, making it necessary for each of the nine college students serving as subjects to use the material leading up to each test element before answering the test item. After the first testing for each item, six hours to one day elapsed before the retest for that item and the use of the prescribed teaching material, which included the next mainline item and the first testing for the next item. The cycle continued in this form until all seven mainline test elements were completed. The reader is reminded that no teaching material was taken between the test and retest for each element. The reason teaching material was necessary in this case, unlike any others in this study, was that the test elements were intended to predict the student's next need given exposure to the preceding teaching material, which was itself preparatory for the test item.



Insert Table 11 about here

The outcomes of the three indices are summarized in Table 11. The bulk of the test elements was passed on both test and retest, giving a high 0.97 validity ratio. The retest procedure no doubt exaggerates validity as compared with use of a parallel test, but in this instance this validity problem is overshadowed by a discrimination problem. The discriminability ratio was an unsatisfactory 0.12 because usually the correct choice was made and consequently the mainline item prescribed. Used in the designated way, remedial pages would be used by few, if any, students. A paper shortage may correct this problem even if quantitative evaluation is ignored.

In failing a test element the student gets one or another remedial item and is then sent back through the mainline item again. Therefore, the consequence ratio is the average time for a single route through, including retaking the mainline teaching and testing. The consequence ratio is 0.75, which is very poor, especially since part of the consequence is additional testing. With the extra testing removed, the ratio is only 0.65, merely 0.15 above the level at which test and consequence equal each other.

The poor consequence ratio seems endemic to the tutor text format, with a little teaching and a little testing on each page. Moreover, a combination of adequate validity and discriminability is unlikely with one multiple-choice item. In this instance the high validity in this program resulted from the low discriminability, in that errors were so infrequent on either testing. On the other hand, if items are written so more students fail (providing better discriminability), chance choices would appear in the multiple-choice format, and with this guessing, validity would suffer. Considered as adaptive material, there is little to recommend Crowder's intrinsic programming. Nevertheless, these programs sometimes are fun. People even enjoy peeking at the error loops for errors they did not make. If so, these programs may be useful, but not because they are adaptive to individual differences.

Stanford CAI Reading

Atkinson's beginning reading program is one of the better known model programs in CAI. His article describing it and reporting data (Atkinson, 1968) added momentum to the use of computers for a fine-grained adaptation of teaching material to student needs. This program was chosen here to represent the extreme in detailed adapting. This is one of the frequently claimed possibilities offered by the use of computers. To explore the applicability of the present measures in this type of adapting, it seemed reasonable to choose from among the most respected of models of CAI.

In the Atkinson reading program the child views a cathode ray tube which presents letters and words. A random-access audio device presents messages, and the student places a light pen on the screen to indicate his choice among alternative answers presented by the cathode ray tube. One of three basic forms, "matrix construction," was used in this analysis. This form is the program's key format for teaching decoding of graphemes to phonemes. In a typical mainline item the child is presented a letter ("r") to the left of an empty cell and a vowel-consonant ending ("an") above the cell. Below the cell are four alternative words ("rat", "bat", "fan", and "ran"). He hears the automated message "touch and say the word that belongs in the empty cell." This mainline item is diagnostic, according to Atkinson, since "it is designed to identify three possible types of errors: (1) The initial unit is correct, but the final unit is not ("rat"). (2) The final unit is correct, but the initial unit is not ("fan"). (3) Neither the initial unit nor the final unit is correctly identified ("bat")" (Atkinson, 1968, p. 228). For either of the first two errors the student gets a single frame which trains either the initial or final consonant, and for the third type of error he receives both corrective frames. After any corrective frame the mainline item is repeated. After a correct choice ("ran") the student gets a confirmation frame and the next mainline item.

Enough data were presented to calculate consequence ratios and discriminability, but not enough to determine validity, although Atkinson's data for the percentage of each type of error do permit corroboration of the implications deriving from the validity determination made for this study. Atkinson indicated that the response rate was about four per minute for mainline and corrective items alike. The consequence of the first two types of errors is one corrective item and a repeat of the mainline item, for a ratio of 0.67 for two types of error. The consequence for the third type of error (e.g., "bat" for "ran") is two corrective items and a repeat of the mainline item, for a ratio of 0.75. Thus the average of the three possibilities is 0.70. This is a disappointing consequence ratio for a program sparking a multi-million dollar CAI movement.
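
The arithmetic behind these three ratios is short enough to lay out. Because mainline and corrective items are answered at the same rate, the sketch counts items rather than minutes: one mainline item is the test, and the corrective items plus the repeated mainline item are the consequence. The names are illustrative, and averaging the three error types with equal weight follows the text rather than any observed error frequencies.

```python
def consequence_ratio(consequence_items: int, test_items: int = 1) -> float:
    """Consequence relative to consequence plus test, counted in items."""
    return consequence_items / (consequence_items + test_items)


# Items that follow each type of error: corrective frame(s) plus the
# repeated mainline item.
ERROR_CONSEQUENCES = {
    "final unit wrong": 2,    # one corrective item + repeat
    "initial unit wrong": 2,  # one corrective item + repeat
    "both units wrong": 3,    # two corrective items + repeat
}

# Round each ratio to two places first, as the reported figures are.
ratios = {
    error: round(consequence_ratio(items), 2)
    for error, items in ERROR_CONSEQUENCES.items()
}
average = round(sum(ratios.values()) / len(ratios), 2)

print(ratios, average)
```

Averaging the rounded figures reproduces the reported 0.70; averaging the unrounded fractions gives roughly 0.69, so the difference is only a matter of rounding.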

To determine validity, nine mainline items were prepared for the
matrix problem described in Atkinson's (1968) paper. These were
prepared on sheets of typing paper in large letters as they appeared on the
CRT. A group of nine students at the appropriate level in reading were
presented these items one by one by their teacher, who spoke the appropriate
audio message. No corrective items were used; only the nine mainline
items were used to diagnose trouble with the initial, final, or both
consonants. The child touched the alternative with his finger and the teacher
recorded the response. After completing the nine items the students were
immediately retested with nine similar items that had the same words,
but a different random arrangement of the response alternatives.
Care was taken to use children who would approximate the excellent
discriminability obtained by Atkinson. He had 45% correct on initial contact
with the mainline items. Subjects in the present study were 37% correct on
the diagnostic testing (see Table 12). For the nine items and nine children
there were a total of 81 decisions with 55 hits and 26 misses, for a disappointing
validity ratio of 0.68, or only 0.18 better than chance.
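
The validity ratio quoted here follows directly from the reported counts. The
sketch below is illustrative only: the function and variable names are
hypothetical, and it assumes a hit is a decision on which the diagnostic test
and the retest agree, with 0.50 taken as the chance level for a two-way
pass/fail prediction.

```python
def validity_ratio(hits: int, misses: int) -> float:
    """Hits / (hits + misses), the form of the ratio used in the tables."""
    return hits / (hits + misses)


# Counts reported in the text: 9 items x 9 children = 81 decisions.
hits, misses = 55, 26
ratio = validity_ratio(hits, misses)
chance = 0.50  # assumed chance level for a pass/fail prediction

print(hits + misses)             # 81
print(round(ratio, 2))           # 0.68
print(round(ratio - chance, 2))  # 0.18
```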

Insert Table 12 about here

It is revealing to consider the 17 hits which were instances of
incorrect answers on both test and retest. The mainline item not only
diagnoses that remedial material is needed, it determines whether training
is needed on initial, final, or both consonants depending on the alternative
chosen. Thus, when an error is made, it is supposed to indicate which
particular alternative treatment is needed. But in this test-retest validity
check did the students make the same errors on both testings if they failed
both? No; of the 17 "hits" by double failures, 5 picked the same alternative
and 12 picked one of the other two alternatives. Apparently, some
children can and do read these words, accounting for the above-chance
portion of the passes, but if they do not read, they just pick one regardless
of the letters. If the criterion for hits and misses of prediction is
whether or not the same alternative is chosen rather than simply passing
or failing, then the validity ratio is much lower, with 43 hits and 38 misses
for a ratio of 0.53, with chance now being 0.25 for one four-way prediction.
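
The recount under the stricter criterion can be reproduced from figures
already given. This sketch assumes, as the Table 12 counts suggest, that the
55 original hits consist of the 17 double failures plus 38 decisions passed
on both test and retest; the variable names are illustrative.

```python
total_decisions = 81
double_failures = 17              # wrong on both test and retest
pass_pass = 55 - double_failures  # remaining original hits (assumed passed on both)
same_alternative = 5              # double failures repeating the same wrong choice

# Under the stricter criterion a double failure counts as a hit only
# when the same alternative is chosen both times.
strict_hits = pass_pass + same_alternative
strict_misses = total_decisions - strict_hits
strict_ratio = strict_hits / total_decisions

print(strict_hits, strict_misses)  # 43 38
print(round(strict_ratio, 2))      # 0.53
```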

This style of program seems beyond redemption. There appears
to be no way to save the basic concept of the prescriptive aspect of this
program. A single multiple-choice item predicting several alternatives
intrinsically has problems with validity, unless the item is made unfailable,
which sacrifices discriminability for the gain in validity. In addition, using
one item to predict the need for only two or three other items hardly provides
adequate advantage to the student even if validity of prediction were perfect.

Should the program be used at all? Students using this course did
better than a control group on standardized reading tests. What this means
is not clear. First, "control" groups from so-called standard classrooms
are not adequate bases for evaluation, as Lumsdaine (1965) pointedly
showed. It is possible, moreover, that the sizable exposure to the
multiple-choice format gave the students some advantage on the standardized
tests following this format. It is also likely that the peppy pace (four frames
per minute) provided by good instrumentation exposed the kids to more
material than the doldrums of a paper, pencil and blackboard method;
but surely man should be able to give some help to the computer. If
someone had the computer and the course, it would seem to do no great
harm to use them, since the technologically inadequate "control" class
was not as good as the technologically inadequate CAI program.

It might be argued that the mainline items are principally teaching
rather than diagnostic in function. The description of the purpose of these
items, quoted earlier, would seem unequivocal in proclaiming their diagnostic
function. But even if these items were considered teaching items, their one
strength as test items, high discriminability, becomes a weakness. In
teaching items, a high error rate indicates that the desired behavior has
not occurred. There is a fundamental incompatibility of the two functions.
Teaching items must get the students to do something new; testing items
must detect the inability of a fair number of students to do it. All things
considered, there is little to commend in this style of CAI program.

Summary

Summary of Demonstrations 

For instruction to be adaptive, a range of validly measured
differences among students must be accommodated by exposing the individual
student only to those materials he needs. Do simple measures of the
three aspects of adaptive instruction described and illustrated above permit
user, developer, or evaluator to easily determine the adequacy of the
adaptive features of almost any instructional material in meeting the
assumptions involved in adapting? The use of the measures of
cost-effectiveness, validity, and discriminability was demonstrated with segments
of seven different adaptive courses ranging from fine-grain adapting in
CAI to pre-course placement testing. All of the materials tested attempt
to apply modern instructional theory and most are well-known published
courses.

Surprisingly, none of the course segments tested proved adequate
on all three measures. But in every case the application of the three
measures prompted concrete suggestions as to what steps should be taken
regarding the specific set of course material. One course, the Programmed
Reviews of Mathematics (Flexer and Flexer, 1967), lacked adequate
discriminability in published form but performed well on all three measures when
tested with the shorter test used by its authors in their pre-publication testing.
The use of the suggested measures offers an easy remedy: use the original
form of testing.

Similarly, applications of the measures prompt suggestions for
remedies in other programs. The IPI math needs much less use of tests
and improved validity. The Job Corps program for GED preparation requires
new screening tests correcting the present lack of both discriminability
and validity. In fact, most programs require cycles of test and revision
to empirically increase the validity and discriminability of the tests while
maintaining as short a test as possible.

On the other hand, in one instance of an excellent set of teaching
material (Individualized Science), the complete absence of discriminability
seems to be simply inherent in the nature of the course content. Children
will seldom be adequately proficient in any unit and therefore would probably
fail any valid pretest. Thus the course would be improved by committing
the heresy of not attempting to adapt to individual differences in proficiency.
Examples of more fine-grained adapting (Tutor Text and CAI Reading)
do not seem to be capable of correction. The single-item multiple-choice
test seems unsuitable for gaining validity without sacrificing
discriminability, and the consequence of passing each test is too small to allow
an increase in number of test items for each decision.
Distribution of Problems

Over half of the program segments had poor discriminability, with
two being completely indiscriminate so that all students would normally
be prescribed the same units.

Four of the seven segments were unsatisfactory in validity
of the tests. The diagnostic tests did not correlate with the criterion tests
when no teaching intervened. Therefore, many students would either be
prescribed teaching material not needed to pass the criterion test or be
allowed to skip teaching material even though they would not be able to
pass the criterion test without it. Two of these four program segments
actually diagnosed no better than a chance flip of the coin would have.

Three of the seven course segments were inadequate in
cost-effectiveness to the student in that the potential saving in time for passing the test
was too small given the amount of time required for the teaching or other
consequent material. One of these nearly required as long in testing as
would be required for the teaching material should it be prescribed.

While no program was adequate in all three measures, none was
inadequate in all three either. The reason is probably that concentration
by the developer on meeting one requirement can easily cause the sacrifice
of another. Long tests are more valid but less cost-efficient, and a guessing
game gives high discriminability but poor validity.
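
This trade-off can be seen by putting the three measures side by side for the
long-form and single-item versions of the Flexer and Flexer fractions tests
(Table 5). The sketch below is a hedged illustration rather than the author's
procedure: the class and field names are hypothetical, the cell order is a
reading of the tables, and taking the less frequent test outcome as the
numerator of discriminability is an assumption inferred from how the tables
label that ratio. The long-form consequence ratio is entered as the printed
value because its underlying counts are not legible.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    pass_pass: int      # passed test, passed criterion (hit)
    pass_fail: int      # passed test, failed criterion (miss)
    fail_pass: int      # failed test, passed criterion (miss)
    fail_fail: int      # failed test, failed criterion (hit)
    consequence: float  # teaching time / (teaching time + testing time)

    @property
    def total(self) -> int:
        return self.pass_pass + self.pass_fail + self.fail_pass + self.fail_fail

    @property
    def validity(self) -> float:
        return (self.pass_pass + self.fail_fail) / self.total

    @property
    def discriminability(self) -> float:
        passes = self.pass_pass + self.pass_fail
        failures = self.fail_pass + self.fail_fail
        # Assumption: the rarer outcome is the numerator.
        return min(passes, failures) / self.total


segments = [
    Segment("long-form", 4, 1, 4, 21, consequence=0.83),
    Segment("single-item", 15, 3, 5, 7, consequence=150 / 156),
]

for s in segments:
    print(f"{s.name}: validity={s.validity:.2f} "
          f"discriminability={s.discriminability:.2f} "
          f"consequence={s.consequence:.2f}")
# long-form: validity=0.83 discriminability=0.17 consequence=0.83
# single-item: validity=0.73 discriminability=0.40 consequence=0.96
```

On these figures the shorter test gives up some validity while gaining in both
discriminability and consequence ratio.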
Implication: Is Adaptive Instruction a Myth?

This author believes, along with the vast majority who conduct
research and development in educational technology, that, at least in
some curricular areas, different students have different needs in reaching
a given educational goal and that adaptive instruction will, therefore,
be useful. But one looks in vain for compelling empirical support for
this proposition. Webster's Seventh New Collegiate Dictionary defines myth
as "an ill-founded belief held uncritically especially by an interested group."
The interested group in educational R&D holds firmly the belief in the
worth of adaptive instruction and their belief is amply rewarded by public
funds. However, the present study shows that for the seven program
segments evaluated, none met each of the necessary requirements for
adaptive materials. Of course, some other set of material not tested may
do so. This author believes that some material will prove adaptive; but,
at present, this belief is still unfounded.

Nor does the experimental literature at present give foundation to
the belief in the worth of adaptive instruction. A decade ago a review of
studies of branching vs. linear programs failed to reveal advantages for
branched programs (Holland, 1965). However, because of the lack of
measurable variables for characteristics of "branching," these studies were
unpersuasive.

Another line of research, reviewed by Bracht (1970), indicates a
general lack of aptitude-by-treatment interaction. The preferred
treatment seems not to change for subjects of different aptitude; but the
myth-busting relevance of this is limited because adaptive instruction generally
has not been concerned with adapting to aptitude differences based on
norm-referenced tests but rather differences in achievement based on
criterion-referenced tests.

Developers proceeded to produce adaptive materials despite
the negative findings of nearly all research. This practice was at
least excusable in view of the weakness of the research. The adaptive
materials used in the research studies were also developed without
explicit guidelines provided by measures for variables of adapting.
Research findings which seem to fail to support the theory of adapting
to differences could result merely from a lack of materials which
truly reflect that theory.

Even though there is scant evidence for a specific advantage of an
adaptive feature, many products of modern educational technology which
contain an adaptive feature produce overall excellent results. IPI Math,
IS, Job Corps Advanced General Education Program, and the Programmed
Reviews of Mathematics are examples of such materials. Proof of whether
or not the good performance of the better materials owes something to the
adaptive feature awaits further evaluation; but the new products of
educational technology involve many aspects which contrast with
conventional practices. Usually the rigid, teacher-oriented classroom is
gone; there is no more lock-step instruction whether or not diagnostic
testing is used for prescribing instruction; there is more individual
attention from teachers who stalk the classroom in search of praisable
performance; and, perhaps most importantly, the teaching materials
are often prepared following learning principles and behaviorally
determined objectives. Therefore, diagnosing and adapting to individual
differences is only one among many ways that these materials differ
from those of a decade ago. But many of these factors, like
adaptiveness, have rarely received clear scientific specifications and, as a
consequence, the contributions of the various components have seldom
been evaluated. Adaptive instruction could be important; some feature
must be important; but as yet the faith in adaptive instruction is an
unfounded belief.
A Solution: From Myth to Fact

Simple-to-apply measures of the necessary characteristics for
adaptive material should help the developer generate good adaptive
instruction worthy of the as yet unsubstantiated acclaim such instruction has
received. With proper measurement and revision cycles there could
soon be efficient adaptive materials using tests of proven validity and
discriminability.

But, beyond the question of adaptive features, many proclaimed
products of educational technology have lacked adequate testing of the
underlying assumptions. Work on these products has proceeded
without explicit definition or measurement methods for variables important
to the preparation of curriculum materials. When measures of all key
variables are developed, several educational myths may turn to facts
and developers get the tools for implementing the much touted
offering of technology.

Table 2



Job Corps Advanced General Education Program

Validity and Discriminability
(1 Decision x 28 Subjects = 28 Total Decisions)

                      criterion
                      P       F
      test    P       0       0
              F      18      10

Validity = Hits / (Hits & Misses)

Discriminability = Passes / (Passes & Failures)

Consequence = Teaching Time in Minutes / (Teaching Time & Testing Time)
            = 510/524



Table 3

Flexer & Flexer, Fractions
First Evaluation - Long Form Tests

Validity and Discriminability
(3 Decisions x 10 Subjects = 30 Total Decisions)

                      criterion
                      P       F
      test    P       5       0
              F       5      20

Validity = Hits / (Hits & Misses)

Discriminability = Passes / (Passes & Failures)

Consequence = Teaching Time in Minutes / (Teaching Time & Testing Time)
            = 150/184



Table 4

Flexer & Flexer, Logarithms

Validity and Discriminability
(1 Decision x 28 Subjects = 28 Total Decisions)

                      criterion
                      P       F
      test    P       2       0
              F       2      24

Validity = Hits / (Hits & Misses) = 26/28 = .93

Discriminability = Passes / (Passes & Failures) = 2/28

Consequence

      Teaching Time in Minutes
      Teaching Time & Testing Time          76




Table 5

Flexer & Flexer, Fractions
Second Evaluation

A. Long-form tests

Validity and Discriminability
(3 Decisions x 10 Subjects = 30 Total Decisions)

                      criterion
                      P       F
      test    P       4       1
              F       4      21

Validity = Hits / (Hits & Misses) = 25/30 = .83

Discriminability = Passes / (Passes & Failures) = 5/30 = .17

Consequence = Teaching Time in Minutes / (Teaching Time & Testing Time)
            = .83

B. Single-item tests

Validity and Discriminability
(3 Decisions x 10 Subjects = 30 Total Decisions)

                      criterion
                      P       F
      test    P      15       3
              F       5       7

Validity = Hits / (Hits & Misses) = 22/30 = .73

Discriminability = Failures / (Passes & Failures) = 12/30 = .40

Consequence = Teaching Time in Minutes / (Teaching Time & Testing Time)
            = 150/156 = .96

Table 6

IPI Math, Multiplication (Level B)
Placement → CET's

Validity and Discriminability
(4 Decisions x 22 Subjects = 88 Total Decisions)

                      criterion
                      P       F
      test    P      20      24
              F       6      38

Validity = Hits / (Hits & Misses) = 58/88 = .66

Discriminability = Failures / (Passes & Failures) = 44/88 = .50

Consequence = (Teaching & Pretest CET's & Posttest items) /
              (Consequence items & Placement test items)
            = 375/380

Table 7

IPI Math, Multiplication (Level B)
Placement + Pretest → CET's

Validity and Discriminability

                          Diagnostic     Criterion   Hit or
Placement     Pretest     Prescription   (CET's)     Miss     Frequency

Pass     +    Pass*       skip           Pass        hit          17
Pass     +    Pass*       skip           Fail        miss          5
Pass     +    Fail*       skip           Pass        hit           3
Pass     +    Fail*       skip           Fail        miss         19
Fail     +    Pass        skip           Pass        hit           5
Fail     +    Pass        skip           Fail        miss          4
Fail     +    Fail        take           Pass        miss          1
Fail     +    Fail        take           Fail        hit          34

                                         Total Decisions          88

Validity = Hits / (Hits + Misses) = 59/88

Discriminability = Failures / (Passes + Failures) = 35/88 = .40

Consequence = Consequence items (CET's + Posttest + teaching) /
              (Consequence items + Placement + Pretests)
            = .93

*In normal test procedure these tests would not be given because
the placement test directed the units to be skipped.

Table 8

IPI Math, Multiplication (Level B)
Pretests → CET's
Pretests → Posttest

Validity and Discriminability
(4 Decisions x 22 Subjects = 88 Total Decisions)

              (CET's)                        (Posttests)
              criterion                      criterion
              P       F                      P       F
  test   P   22       9          test   P   25       6
         F    4      53                 F    6      51

Validity (CET's) = Hits / (Hits & Misses) = 75/88 = .85

Validity (Posttests) = Hits / (Hits & Misses) = 76/88 = .86

Discriminability = Passes / (Passes & Failures) = 31/88

Consequence = Consequence items (Teaching & CET's & Posttest) /
              (Consequence items & Pretest items)
            = 353/375 = .94



Table 9

Individualized Science, Hooke Unit

Validity and Discriminability
(5 Decisions x 10 Subjects = 50 Total Decisions)

                      criterion
                      P       F
      test    P       0       0
              F       1      49

Validity = Hits / (Hits & Misses) = 49/50 = .98

Discriminability = Passes / (Passes & Failures) = 0/50

Consequence = Teaching Time in Minutes /
              (Teaching Time & Testing Time = 127)
            = .91

Table 10
Inductive Reasoning

Validity and Discriminability
(7 Decisions x 11 Subjects = 77 Total Decisions)

                      criterion
                      P       F
      test    P      26      26
              F      15      10

Validity = Hits / (Hits & Misses)

Discriminability = Failures / (Passes & Failures)

Consequence = Teaching Items / (Teaching Items & Testing Items = 263)



Table 11

A Tutor Text on the Arithmetic of Computers

Validity and Discriminability
(9 Subjects x 7 Decisions = 63 Total Decisions*)

                      criterion
                      P       F
      test    P      52       0
              F       2       5

Validity = Hits / (Hits & Misses) = 57/59

Discriminability = Failures / (Passes & Failures) = 7/59 = .12

Consequence

Consequence ratio = Consequence / Total = .75

Consequence ratio = Teaching / Total = .65

(Counts as printed: 1559, 1125)

*Four incompleted items lowered the actual number to 59 total decisions.

Table 12
Atkinson, Reading Program

Validity and Discriminability
(9 Decisions x 9 Subjects = 81 Total Decisions)

                      criterion
                      P       F
      test    P      38      10
              F      16      17

Validity = Hits / (Hits & Misses) = 55/81 = .68

Discriminability = Failures / (Passes & Failures) = 33/81 = .37

Consequence = Teaching Items / (Teaching Items & Testing Items)
            = 21/30 = .70

Figure Captions

Figure 1. Adapting structure used in the Job
Corps Advanced General Education Program. A score of
85% or better on a unit screening test enables the student
to skip all of the lessons under the catchment area of the
screening test. In Level II, the number of lessons per
unit ranges from 2-12. In the figure, unit screening tests
are depicted as rectangles, lessons as squares, and unit
posttests as diamonds.

Figure 2. Branching-tree for the binary search
procedure showing the branching sequences which determine
the various entry points into the linear sequence shown in
the column at the extreme right of the figure. Each sequence
begins with item 128 shown at the extreme left of the figure,
proceeds upward to item 192 after a correct response or
downward to item 64 after an incorrect response. This
procedure repeats six times, bisecting successively smaller
intervals until an entry point in the last column is
reached.